The World-Wide Web: Quagmire or Gold Mine? Is information on the Web sufficiently structured to facilitate effective Web mining?

Abstract

Skeptics believe the Web is too unstructured for Web mining to succeed. Indeed, data mining has been applied traditionally to databases, yet much of the information on the Web lies buried in documents designed for human consumption such as home pages or product catalogs. Furthermore, much of the information on the Web is presented in natural-language text with no machine-readable semantics; HTML annotations structure the display of Web pages, but provide little insight into their content. Some have advocated transforming the Web into a massive layered database to facilitate data mining [12], but the Web is too dynamic and chaotic to be tamed in this manner. Others have attempted to hand code site-specific "wrappers" that facilitate the extraction of information from individual Web resources (e.g., [8]). Hand coding is convenient but cannot keep up with the explosive growth of the Web.

As an alternative, this article argues for the structured Web hypothesis: Information on the Web is sufficiently structured to facilitate effective Web mining. Examples of Web structure include linguistic and typographic conventions, HTML annotations, classes of semi-structured documents (e.g., product catalogs), Web indices and directories, and much more. To support the structured Web hypothesis, this article will survey preliminary Web mining successes and suggest directions for future work.

Web mining may be organized into the following subtasks:

• Resource discovery. Locating unfamiliar documents and services on the Web.
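To make the notion of structure in HTML annotations concrete, the following is a minimal sketch, not taken from the article, of how a program might read a page's title and outgoing links using only Python's standard `html.parser` module. The choice of the `<title>` and `<a href>` tags, the class name `StructureExtractor`, the helper `extract_structure`, and the sample page are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class StructureExtractor(HTMLParser):
    """Collects the page title and outgoing links from HTML annotations.

    Hypothetical helper for illustration; not the article's method.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Accumulate text only while inside <title>...</title>.
        if self._in_title:
            self.title += data


def extract_structure(html):
    """Return the title and link targets found in an HTML string."""
    parser = StructureExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "links": parser.links}


# Made-up sample page standing in for a semi-structured product catalog.
SAMPLE_PAGE = """<html><head><title>Acme Product Catalog</title></head>
<body><h1>Catalog</h1>
<a href="/widgets.html">Widgets</a>
<a href="/gadgets.html">Gadgets</a>
</body></html>"""

if __name__ == "__main__":
    print(extract_structure(SAMPLE_PAGE))
```

A site-specific wrapper of the kind the abstract mentions would go further and hard-code the layout of one particular site, which is why such wrappers are hard to maintain as the Web grows.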

Related resources

A Technique for Improving Web Mining using Enhanced Genetic Algorithm

The World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines use conventional methods to retrieve information on the Web; however, their search results can still be refined and their accuracy is not high enough. One of the methods for web mining is evolutionary algorithms, which search according to the user interests...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Prediction of user's trustworthiness in web-based social networks via text mining

In social networks, users need a proper estimation of trust in others to be able to initialize reliable relationships. Some trust evaluation mechanisms have been offered, which use direct ratings to calculate or propagate trust values. However, in some web-based social networks where users only have binary relationships, there is no direct rating available. Therefore, a new method is required t...

Construction of Web-Based, Service-Oriented Information Networks: A Data Mining Perspective - (Abstract)

Mining directly on the existing networks formed by explicit webpage links on the World-Wide Web may not be so fruitful due to the diversity and semantic heterogeneity of such web-links. However, construction of service-oriented, semi-structured information networks from the Web and mining on such networks may lead to many exciting discoveries of useful information on the Web. This talk will dis...

Adaptive Information Analysis in Higher Education Institutes

Information integration plays an important role in academic environments since it provides a comprehensive view of education data and enables managers to analyze and evaluate the effectiveness of education processes. However, the problem with traditional information integration is the lack of personalization due to weak information resources or unavailability of analysis functionality. In this ...

Publication date: 1996
